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Abstract 

We  consider  the  problem  of  nonparametric  regression.  The  aim  is  to  get  a smooth  function 
which  represents  the  dataset  and  has  a reasonable  number  of  extreme  values.  An  iterative 
method,  the  QSOR  method  is  introduced.  Problems  with  the  slow  convergence  of  the 
method  are  reduced  using  multigrid  techniques. 


1 Introduction 

Given  a dataset  {y{U),i  = 1, . . . , n]  which  we  denote  by  y,  we  look  for  a decomposition 
y{U)  = f{U)  + r(ti),  ( U = i/n,  i = 1, . . . ,n) 

where  / is  a simple  function  and  the  {r(tj),(i  = l,...,n)}  are  the  resulting  residuals 
which  approximate  white  noise.  We  use  two  different  concepts  of  simplicity.  The  first 
is  the  number  of  local  extreme  values.  The  second  is  the  smoothness  of  the  function  as 
measured  by  the  standard  smoothness  functional 

S(f)  ■■=  [\f™(t))2dt , 

Jo 

where  / (2)  is  the  second  derivative  of  /.  The  number  of  local  extremes  is  taken  to  have  pri- 
ority over  smoothness.  The  number  of  local  extremes  and  their  locations  are  determined 
by  the  taut  string  method  developed  in  [3].  This  is  described  briefly  in  the  next  section. 
The  residuals  are  required  to  look  like  white  noise  in  the  sense  that  the  means  over  certain 
dyadic  intervals  are  required  to  lie  within  given  bounds  [3].  The  multiresolution  coef- 
ficients for  (n  = 2")  are  defined  by:  wtj  2^-’/2 ^Ylk=j2?+ir(tk),(i  — 0, = 

0, - 1).  The  multiresolution  condition  now  requires  that  -c„  < Wij  < cn, 
where  cn  represents  some  form  of  thresholding.  The  default  value  of  c„  which  we  use  is 
cn  = an i/2.51og(n)  where  an  = 1.482  • median(|y2  — 2/i | , — , \yn  ~ yn-i\)/V2. 
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Fig.  1.  The  top-left  caption  shows  the  original  doppler  function  and  the  top-right 
caption  shows  the  noisy  version.  The  bottom-left  caption  shows  the  result  of  the  taut 
string  algorithm  with  the  resulting  residuals  being  shown  in  the  bottom-right  caption. 

2 Taut  string 

A short  description  of  the  taut  string  method  is  as  follows.  We  write  / = (/j, . . . , fn)T 

f(tn))T  e Rn  and  denote  the  cumulative  sums  of  y and  f by  Y and  F 
respectively,  Yt  = Vh  F*  = Yl)= i fv  (*  = 0,  • • • , n),  with  Y0  = F0  = 0.  We  specify 
bounds  defined  by  A = (Ai, . . . , A„)T  G R”  and  consider  the  tube 

{G:|F-G|<A}.  (2.1) 

The  taut  string  V(A)  is  now  the  function  defined  by  a taut  string  attatched  to  the 
points  (0,  Y0)  and  ( n,Yn ) and  constrained  to  lie  within  the  tube  (2.1).  It  can  be  shown 
that  the  taut  string  minimizes  the  number  of  extreme  values  of  the  functions  g whose 
cumulative  sums  G lie  within  the  tube.  The  taut  string  is  continuous  and  piecewise 
linear.  Its  derivative  v(X)  is  taken  as  a candidate  regression  functions.  The  vector  A is 
determined  in  a data  dependent  manner  by  the  requirement  that  the  residuals  associated 
with  n(A)  {r(A)i  = Vi  — v(X )i,i  = l,.,.,n}  satisfy  the  multiresolution  condition.  If 
such  a condition  fails  on  an  interval  then  the  A-values  associated  with  that  interval 
are  reduced  in  size.  An  application  of  the  taut  string  method  to  the  doppler  data  of 
Donoho  and  Johnstone  (see  e.g.  [4])  is  shown  in  Figure  1.  The  function  is  defined  by 

f(t ) = 21y/(n(l  - t))sin  (2-k  *+^05)  ■ The  derivative  v(A)  is  piecewise  constant  as  may 
be  seen  from  Figure  1.  The  function  v(X)  determines  the  number  of  local  extremes.  We 
take  the  midpoints  of  the  intervals  associated  with  a local  extremes  as  the  locations  of 
the  local  extremes  for  the  smoothing  algorithm. 
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3 The  smoothing  problem 

We  make  the  smoothing  problem  precise  as  follows.  The  number,  locations  and  type  of 
extreme  values  are  taken  from  the  taut  string  as  explained  in  the  last  section.  We  further 
require  the  function  / to  lie  in  the  tube  determined  by  the  taut  string.  This  is  to  prevent 
the  smoothing  procedure  from  moving  too  far  from  the  data.  These  restrictions  may  be 
described  in  the  form 

Af  > b (3.1) 

for  an  appropriate  matrix  A and  vector  b.  This  leads  to  the  following  problem: 
minimize  £"=1(/t+i  - 2/*  + /)_ i)2  subject  to  (3.1), 

or  equivalently 

minimize  FtQ3F  subject  to  (3.1), 

for  some  quadratic  form  Q3.  We  denote  this  latter  quadratic  programming  problem  by 
QP3.  Clearly  the  matrix  associated  with  the  quadratic  form  X2”=i(/i+i  - 2/j  + fi-1)2 
is  singular.  Nevertheless  the  solution  of  QP3  may  be  unique.  We  have  the  following 
theorem. 

Theorem  3.1  Let  V(\)  be  the  result  of  the  taut  string  method.  Assume  that  U(A)  has 
one  extreme  value.  We  define  the  bounds  L , U by 


L:=Y  -\,U  :=Y  + \. 

Let  Fi , F2  be  two  solutions  of  the  corresponding  quadratic  program.  Additionally  let  Fi 
touch  three  bounds  alternately 


Then 


(i.e.  Ui^Li^Uis  or  Li^Vi^Li^iii  <i2<  h)  are  active). 


F\  = F2. 


We  call  a problem  with  a unique  solution  a nondegenerate  problem.  From  now  on  we 
assume  that  our  problem  is  nondegenerate. 


3.1  Quadratic  programming 

There  are  many  algorithms  which  solve  quadratic  programming  problems  directly.  Un- 
fortunately most  of  them  are  expensive  in  terms  of  memory  requirements  and  are  not 
feasible  for  data  sets  of  the  order  say  n = 8196.  To  overcome  this  we  look  for  iterative 
methods  which  converge  to  the  solution.  Gradient  projection  methods  (e.g.  as  defined 
in  [8],  [2]  or  [9])  are  not  appropriate  for  this  purpose  as  the  monotonicity  constraints 
make  the  projection  into  the  feasible  set  too  expensive.  Instead  we  use  a modified  ver- 
sion of  the  QSOR  (quasi  successive  over  relaxation)  method  developed  by  Metzner  in 
[7].  QSOR  is  a very  cheap  iteration  and  converges  to  the  solution  of  QP3.  Unfortunately 
the  convergence  is  very  slow  on  sections  where  the  solution  is  smooth.  To  overcome  this 
we  use  multigrid  methods  which  have  to  be  adapted  to  our  requirements. 
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4 QSOR 

The  QSOR  algorithm  is  an  iterative  method  which  produces  a feasible  sequence  {Fk}kL0 
converging  towards  the  solution  of  QP3.  For  simplicity,  we  describe  the  iteration  only  for 
a convexity  interval.  Let  F°  G Rn  be  an  arbitrary  feasible  vector.  The  obvious  candidate 
is  the  derivative  of  the  taut  string.  Let  Q = Qs  and  ui  G (0,2).  The  following  defines  a 
QSOR  iteration. 

• While  convergence  not  achieved 

• F = Fk 

2 = 1 

F = F<  - U = max{2Fi+i  - Fi+2,  Li},  Ui  = Ui,  F = med  {Lu  Ui,  F } 

2 = 2 

F = Fi-  Qz)a , U = max{2Fi+1  - Fi+ 2,  Li} 

Ui  = min{(F)+i  + F_ i)/2,  Ui},  F = med {F,  Uh  Fi} 

• for  (i  in  3:(n-2)){ 

Fi  = Fi  - ^t( Qz)i , U = max{2Fi+i  - Fi+2 , 2F_X  - F_2,  F} 

Ui  = min{(F)+i  + F_ i)/2,  Z7*},  F = med{F,  Ui,  F} 
if  (i  active)  mark  i 
} 

} 

1 — n 

F = F2  - ^-( Qz)u,Li  = max{2F-i  - F_2,F},F  = E/i,F  = med{F,F,F} 

2 = 1 

Fi  = Fi-1*ri(Qz)i 

Li  = max(2Ft^_i  Fj_|_2,.F} 

Fi  = UiFi  = med{F,  Ui,  F} 

• correct  the  active  intervals: 

* Let  [F,F+fc]  be  an  active  Interval:  F = F + t ^"1*^  (F+fc  — F)-  Denoting 
the  2-th  unit  vector  in  Mn  by  e,  and  a,  b defined  by 

6 E£(gfli  0 TEZuiQzV 

v+k  v+k  , v+k  v+k 

Z-~dl  — V 1 

set  Fj'  :=  Fj  — a(atj  + 6)  with 

• F*  = F for  all  2 in  other  intervals 

Theorem  4.1  (convergence)  Let  (Ffc)£T0  be  the  sequence  in  Rn  produced  by  the  QSOR 
algorithm  and  let  the  problem  QP3  be  nondegenerate.  Then 

• {Fk)kLo  converges  in  M”. 

• F*  :=  lim  Fk  is  the  solution  of  QP3. 
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Fig.  2.  The  captions  top-left,  top-right,  bottom-left,  bottom-right  show  the  result  of 
the  QSOR  iteration  for  the  doppler  data  (n  = 2048)  after  1000, 5000, 10000, 20000  steps 
respectively. 

The  slowness  of  the  convergence  can  be  seen  by  the  fact  that  the  doppler  data  of  Figure  1 
required  two  million  iterations  before  a satisfactory  solution  was  obtained.  This  is  shown 
in  Figure  2.  After  a small  number  of  iterations  the  solution  does  not  change  any  more 
on  the  “left  side”  where  the  function  oscilates  rapidly.  After  1000  iterations  of  QSOR 
(which  is  fast  because  one  QSOR  step  is  very  cheap!)  the  solution  looks  very  smooth 
except  for  a few  “buckles”  on  the  “right  side”  of  the  solution.  The  method  needs  many 
iterations  (up  to  two  million)  to  reach  an  adequate  smoothness.  The  slowness  of  the 
convergence  is  known  from  the  original  SOR  method  for  solving  linear  equations.  In  the 
standard  case  of  solving  linear  equations  multigrid  methods  can  be  used  to  speed  up  the 
rate  of  convergence.  We  now  apply  this  idea  to  solving  the  problem  QP3. 


5 Multigrid  QSOR 

Multigrid  techniques  are  general  techniques  to  speed  up  iterative  methods  which  indeed 
have  other  good  properties.  The  ideas  are  given  for  example  in  [1]  or  [5].  We  will  give  here 
a short  description  of  the  multgrid  idea  in  our  case.  First  some  notation.  Given  a grid 
G = Gf  = (<i, . . . ,tn)  we  define  the  coarse  grid  Gc  = (ti,  tj2, . . . , tim_1,tn),ii  = l,im  = 
n with  ij  € {1,  We  define  the  projection  down  Icx  = (Fi,Fi2, . . . , Fim_1,  Fn)T 

and  the  projection  up  Icx  — y where  yi  = Fi  (l  e {ij\j  = 1 , . . . , rn } ) . and  by  linear 
interpolation  elsewhere,  i.e., 


Vi 


F-  - F 

ch  r'i- 1 

Uj  ~ Uj-i 


(ij- 1 < l < ij). 


We  define  now  the  multigrid  QSOR  with  only  two  grids,  i.e.,  of  level  two.  The  general 
case  of  level  v e N is  defined  similarly.  Let  QSOR(G , A,  b,  />.  a:)  denote  the  result  of  the 
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Fig.  3.  The  left  figure  shows  the  result  of  QSOR  after  5000  iterations.  The  right  figure 
shows  the  result  of  (1000)  multigrid  QSOR  with  one  coarsing  step  (i.e.  the  right  figure 
is  “cheaper”  than  2000  QSOR  streps). 

QSOR  method  applied  to  the  problem  on  the  grid  G after  p iterations  on  the  Grid  G 
with  starting  vector  x and  constraints  defined  by  A,  b.  Additionally  let  Fk  be  given. 

• Multigrid  QSOR 

* while  precision  not  achieved 
o F = QSOR(G,A,b,p,Fk) 
o F = IcQSOR(Gc,  Ac,  be,  n,  ICF) 
o Fk+1  = QSOR{G,  A, b,p,F) 
o k <—  k + 1 

where  Ac,  bc  are  the  corresponding  constraints  for  the  coarser  grid.  The  question  is  now 
how  to  define  the  projection  of  the  constraints.  One  can  think  of  an  example  where 
the  canonical  projection  of  bounds  like  Gc  can  fail.  This  happens  for  example  if  strong 
constraints  (e.g.  tight  bounds)  are  not  on  the  coarse  grid.  To  overcome  this  problem  one 
has  to  think  of  a method  of  defining  the  problem  QP3  on  the  coarser  grid  in  a reasonable 
way.  One  way  to  handle  this  problem  is  to  define  Lij  :=  max{Lfc|i.,_i  < k < ij+i}  and 
“min”  for  the  upper  bounds.  Special  cases  have  to  be  treated  but  we  do  not  go  into 
details  here.  A coarser  grid  means  that  the  QSOR  step  on  this  grid  converges  much 
faster.  On  the  other  hand  the  approximation  of  the  solution  gets  worse  by  coarsening 
the  grid.  In  our  case  (see  Figure  4)  we  have  n — 2048.  The  coarsest  grid  was  made  by 
taking  every  eighth  gridpoint.  We  iterated  until  there  was  no  recognizable  improvement. 


6 Proofs 

Proof  of  Theorem  3.1:  We  set  D = F2  — F\.  One  simply  verifies  that  D has  to  be  a 
line,  i.e.,  there  are  a,  b £ R such  that  D,t  — ati  + b.  Touching  three  bounds  alternately 
means  that  D changes  its  sign  at  least  two  times  which  leads  to  D = 0.  □ 
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Fig.  4.  Multigrid  QSOR  applied  to  the  doppler  data  with  n = 2048.  The  figure  took 
less  than  6 seconds  comparing  to  three  hours  without  multigrid  on  the  same  computer. 

Proof  of  Theorem  4.1:  We  set  Q = Q3.  We  have  to  show: 

1)  (S3(Fk))keNo  decreases; 

2)  ( Fk)ken  is  feasible; 

3)  If  Fs  is  a stationary  point  of  QSOR,  then  Fs  minimizes  S3  in  the  feasible  set. 

• our  feasible  set  is  compact,  so  the  sequence  has  a convergent  subsequence, 

• a limit  of  a subsequence  of  {Fk)’gL1  is  a stationary  point  of  QSOR, 

• the  problem  has  only  one  solution. 

To  the  first  point,  we  only  remark  that  a,  6 as  defined  in  the  algorithm,  are  the  minimizers 
of  the  term: 

(u+k  1 /+k  \ \ ^ / / v+k  v+k 

x ^2  tie i + y Y2  Ud  ] j Q ( 2 - I X tjef  + y 

i~v  i=u  / / \ \ i=F  i=v 

The  others  are  treated  as  in  [6] . The  second  point  is  clear,  because  by  the  definition  we 
start  with  a feasible  vector  and  we  retain  the  feasibility  in  every  single  step.  It  remains 
to  show  the  third  point.  Let  Fs  be  a stationary  point  of  the  algorithm.  It  is  sufficient  to 
show  that  ( QFS,Z  - Fs)  > 0 for  all  feasible  vectors  z (see  [6]),  where  (,)  denotes  the 
standard  inner  product  in  Mn.  To  show  this  we  first  note  that  Q = DTQiD,  where 

/ 1 \ 


V -11/ 

and  Q-2  is  the  matrix  according  to  QP3,  i.e.,  to  the  direct  problem.  So  we  can  deduce 
that  ( QFS , Z-FS)  = (Z-  FS)TQFS  = (Z  - FS)TDTQMDFS  = ( QMfs,z  - fs)(fs  := 
DFs,z  :=  DZ).  Now  we  only  have  to  look  at  the  “active  points”  because  ( QFs)i  is 
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zero  everywhere  else.  Let  Z be  an  arbitrary  feasible  vector  and  j be  an  index  with 
Z?  = Lj  and  (Qz)j  ^ 0,  so  — u>(QFs ) < 0.  With  the  feasibility  of  Z,  it  follows  that 
(QFs)j{Zj  — Zj)  — ( QFs)j(Zj  — Lj)  > 0.  With  the  same  argument  we  can  derive 
(QFs)j(Zj  — ZSj)  > 0 if  Fs  touches  the  upper  bound.  It  remains  to  show  the  inequality 
for  the  linearity  intervals.  Let  [tv,t„+k]  be  a linearity  interval  of  Fs.  Then  obviously 
[tv+i,tv+k]  is  a constancy  interval  for  Fs.  Furthermore  it  follows  from  the  stationarity 
of  Fs  that  a,  b as  defined  in  the  algorithm  are  zero.  This  is  equivalent  to 

v+k  v+k  1 v-\-k 

j2(qfs)= o,  («* =*/»), 

i=v  i=if  i—i/ 

which  implies  that 

i i i 

= ( Qmx)u 

i= 1 i= v i—  1 

for  arbitrary  leR"  and  x = DX.  So  our  conditions  are 

v-\-k  v+k 

(QMFs)„  = 0,  ^2(QMFs)i  = o £ ( QmF s)i  = o. 

i= v i=i/+ 1 

This  case  was  proved  by  Lowendick  [6],  □ 
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